Critic Learning-Based Safe Optimal Control for Nonlinear Systems with Asymmetric Input Constraints and Unmatched Disturbances

In this paper, a safe optimal control method based on adaptive dynamic programming (ADP) is investigated for continuous-time (CT) nonlinear safety-critical systems with asymmetric input constraints and unmatched disturbances. First, a new non-quadratic form function is introduced to handle the asymmetric input constraints. Subsequently, the safe optimal control problem is transformed into a two-player zero-sum game (ZSG) problem to suppress the influence of unmatched disturbances, and a new Hamilton–Jacobi–Isaacs (HJI) equation is introduced by integrating the control barrier function (CBF) with the cost function to penalize unsafe behavior. Moreover, a damping factor is embedded in the CBF to balance safety and optimality. To obtain a safe optimal controller, only one critic neural network (CNN) is used to solve the complex HJI equation, which reduces the computational load compared with the conventional actor–critic network. Then, the system state and the parameters of the CNN are proven to be uniformly ultimately bounded (UUB) by the Lyapunov stability method. Lastly, two examples are presented to confirm the efficacy of the presented approach.


Introduction
Safety-critical systems are those in which accidents or failures can have severe consequences, including but not limited to injuries, loss of life, environmental harm, or financial losses. The emergence of safety-critical systems like unmanned aerial vehicles (UAVs) [1][2][3] and robots [4] has led to an increased focus on safety control design within the field of control systems [5,6]. Safety control designs entail control strategies that satisfy safety specifications imposed by environmental limitations or physical limitations of the system. Ignoring safety entails substantial risks to both property and personal security. To address the challenges of safe controller design, researchers have provided some effective approaches [7][8][9][10][11]. The problem of safety in the presence of unmodeled dynamics or disturbances in drones has recently been addressed by designing a robust controller based on a nonlinear estimator in [9]. In ref. [10], the use of neural networks integrated with the Lyapunov theory was preliminarily treated with application in the automotive sector for critical situations, and this aspect was further addressed in an even more organic way. In ref. [11], a quadratic programming-based method was applied to develop a safe controller. Although this method can guarantee safety at a local level for every time step, selecting a step size that is too small leads to redundant computations. In contrast, a step size that is too large causes unsafe behavior, making it challenging to ensure the safety of the system. Hence, it is crucial
1. Asymmetric input constraints are considered in the control problem of the CT nonlinear safety-critical systems. In addition, this paper proposes a new non-quadratic form function to address the issue of asymmetric input constraints. It is important to note that when applying this approach, the optimal control policy no longer remains at 0, even when the system state reaches the equilibrium point $x = 0$ (see $u^*(x)$ in later Equation (15)).
2. This paper adopts the CBF to construct safety constraints and proposes designing a damping coefficient within the CBF to balance the safety and optimality of safety-critical systems based on varying safety requirements in different applications.
3. The safe optimal control problem is turned into the ZSG problem to address unmatched disturbances; then, the optimal control law is obtained by solving the HJI equation using one CNN. Using only one CNN to approximate the solution of the HJI equation reduces the computational burden compared to the actor-critic network. Moreover, the system state and the CNN parameters are demonstrated to be UUB.
The remainder of this article is organized as follows. Section 2 provides the initial formulation of the problem. Section 3 presents a safe optimal control design for the two-player ZSG problem. Then, in Section 4, an online adaptive CNN method for solving the HJI equation is proposed, and its stability is verified. Section 5 introduces two examples to demonstrate that the presented approach is effective. Lastly, Section 6 gives conclusions.

Problem Statement
Consider the CT nonlinear safety-critical system $\dot{x} = F(x) + G(x)u + P(x)v$ (1), where $x = [x_1, x_2, \ldots, x_n]^T \in C_a \subseteq \mathbb{R}^n$ indicates the $n$-dimensional system state vector, $F(x) \in \mathbb{R}^n$ represents the internal dynamics, and $G(x) \in \mathbb{R}^{n \times m}$ and $P(x) \in \mathbb{R}^{n \times q}$ indicate the control and disturbance coefficient matrices, respectively. Additionally, $u \in \mathbb{R}^m$ denotes the $m$-dimensional control input, constrained to the set $\{u \mid u_{\min} \le u \le u_{\max}\}$, where $u_{\max}$ and $u_{\min}$ stand for the upper and lower bounds, respectively, and $v \in \mathbb{R}^q$ is the unmatched disturbance. The paper assumes that $F(\cdot)$, $G(\cdot)$, and $P(\cdot)$ are Lipschitz continuous with $F(0) = 0$, and that the safety-critical System (1) is stabilizable and controllable. Moreover, we assume that $G(x)$ and $P(x)$ are bounded, i.e., there exist two constants $G_M > 0$ and $P_M > 0$ such that $\|G(x)\| \le G_M$ and $\|P(x)\| \le P_M$ for any $x \in \mathbb{R}^n$. In addition, it is essential to emphasize that $C_a$ represents a safe set for (1). $C_a$ is derived from operational restrictions, such as the allowable states of a robot arm, and is mathematically determined by Equation (2), where $z(x)$ is continuous in $x$. The set $\mathrm{int}(C_a)$ denotes the interior of $C_a$, while $\partial C_a$ represents the boundary of $C_a$.
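To make the roles of $F$, $G$, and $P$ concrete, the following is a minimal simulation sketch, assuming the control-affine form $\dot{x} = F(x) + G(x)u + P(x)v$ implied by the definitions above. The scalar plant, the forward-Euler integrator, the step size, and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate(F, G, P, u_fn, v_fn, x0, dt=1e-3, steps=1000):
    """Forward-Euler rollout of x' = F(x) + G(x) u + P(x) v."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for k in range(steps):
        t = k * dt
        u = np.atleast_1d(u_fn(t, x))  # control input, shape (m,)
        v = np.atleast_1d(v_fn(t, x))  # disturbance input, shape (q,)
        x = x + dt * (F(x) + G(x) @ u + P(x) @ v)
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical scalar plant (n = m = q = 1), chosen only for illustration.
F = lambda x: -x                   # internal dynamics with F(0) = 0
G = lambda x: np.array([[1.0]])    # control coefficient matrix
P = lambda x: np.array([[1.0]])    # disturbance coefficient matrix

# With u = 0 and v = 0 the state decays like exp(-t).
traj = simulate(F, G, P, lambda t, x: 0.0, lambda t, x: 0.0, x0=[1.0])
```

Passing a nonzero `v_fn` shows how the disturbance enters through $P(x)$, a separate channel from the control input.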
Subsequently, the infinite-horizon cost function from $t = 0$ for System (1) is given by Equation (3), where $Q$ represents a positive definite function, $\|v\|^2 = v^T v$, $\Upsilon > 0$ represents a constant weight coefficient, and $U(u)$ is a non-quadratic form function employed for handling the asymmetric input constraints, determined by Equation (4), with $\Psi$ (the half-range of the input bounds) and the midpoint of the input bounds defined in Equation (5), where $|u_{\max}| \neq |u_{\min}|$ and $\tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$ with $z \in \mathbb{R}$.

Remark 1.
Even though $\tanh(z)$ is symmetric, $U(u)$ in (4) generates asymmetric constraints on the control signal $u^*(x)$ (see $u^*(x)$ in later (15)). This is because the midpoint of the input bounds in (4) is not equal to 0. This feature distinguishes the present setting from the case of symmetric input constraints.
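The effect described in Remark 1 can be illustrated with a short sketch. It assumes the common saturated form $u = -\Psi\tanh(a) + \text{midpoint}$ for the constrained policy, with $\Psi = (u_{\max} - u_{\min})/2$ and midpoint $(u_{\max} + u_{\min})/2$. These two values agree with the numbers reported later for the examples, but the exact policy is Equation (15), which is not reproduced here, so the function names and the form of the argument `a` are assumptions.

```python
import numpy as np

def bound_parameters(u_min, u_max):
    """Half-range and midpoint of an (asymmetric) input interval."""
    half_range = (u_max - u_min) / 2.0
    midpoint = (u_max + u_min) / 2.0
    return half_range, midpoint

def saturated_control(a, u_min, u_max):
    """Assumed policy shape: a symmetric tanh shifted by the midpoint."""
    half_range, midpoint = bound_parameters(u_min, u_max)
    return -half_range * np.tanh(a) + midpoint

# Bounds of Example 1: u_min = -1, u_max = 2.
half_range, midpoint = bound_parameters(-1.0, 2.0)
a_grid = np.linspace(-50.0, 50.0, 1001)
u_grid = saturated_control(a_grid, -1.0, 2.0)

# At a = 0 (for instance at the equilibrium) the input equals the midpoint, not 0.
u_at_origin = saturated_control(0.0, -1.0, 2.0)
```

Because the midpoint is nonzero, the symmetric range of $\tanh$ is shifted onto the asymmetric interval, which is also why the policy does not vanish at the origin.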
Additionally, the ultimate objective of this paper is to devise the safe and optimal control input policy for (1), which involves the utilization of the CBF concept. In the upcoming section, this paper presents the concept of the CBF and proposes an ADP-based approach to design the safe and optimal controller.

Safe Optimal Control Design
This section first explains the concept of the CBF. Then, the safe and optimal control problem is converted to the two-player ZSG to overcome the unmatched disturbances, and the CBF is integrated directly into the cost function to penalize unsafe behavior.

Control Barrier Function
The CBF provides a way to address the safety constraint problem in safety-critical systems. The CBF is a function that is non-negative within the set $C_a$ and diverges to infinity at the boundary of $C_a$. As the state $x$ is about to reach the boundary of $C_a$, the negative-derivative condition brings the system state $x$ back within $C_a$, ensuring that the system state is always confined within $C_a$. To better illustrate the properties of the CBF, the following assumption is given.
Assumption 1. The CBF candidate $B_r(x)$ meets three characteristics [40,41]. Moreover, for all $x \in C_a$, the CBF $B_r(x)$ has further properties stated in terms of $\gamma_1(\cdot)$, $\gamma_2(\cdot)$, and $\gamma_3(\cdot)$, which are class $\mathcal{K}$ functions. Under the premise that Assumption 1 and Equation (6) both hold, a suitable choice for $B_r(x)$ is $\rho\, y(x)/z(x)$, where $y(x)$ represents a special scheduling function determined by the user to allow for flexibility in selecting $B_r(x)$. Specifically, $y(x)$ ensures that the CBF operates only when the system is close to the unsafe set, and $\rho > 0$ is the damping factor used to balance safety and optimality.

Remark 2.
In contrast to the previous CBF [16], the ρ chosen here shows a positive correlation with the value of B r (x). The larger the value of ρ, the faster the system state moves away from the unsafe set, and the smaller the value of ρ, the slower the state x moves away from the unsafe set. A smaller value of ρ emphasizes optimality and a larger value of ρ enforces safety.
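The role of $\rho$ in $B_r(x) = \rho\, y(x)/z(x)$ can be sketched numerically. The paper's concrete $y(x)$ and $z(x)$ are not reproduced here, so the choices below are hypothetical: $z(x)$ is the squared distance to a danger ball minus the squared radius (positive in the safe set, zero on its boundary), and $y(x) = e^{-k z(x)}$ is a scheduling function that switches the barrier off far from the ball. The center and radius are those of the danger region reported for Example 1.

```python
import numpy as np

CENTER = np.array([0.3, 0.05, -0.05])  # danger-ball center from Example 1
RADIUS = 0.15                          # danger-ball radius from Example 1

def z(x):
    """Hypothetical safe-set function: > 0 outside the ball, 0 on its boundary."""
    d = np.asarray(x, dtype=float) - CENTER
    return float(d @ d - RADIUS**2)

def y(x, k=5.0):
    """Hypothetical scheduling function: near 1 at the boundary, decays away from it."""
    return float(np.exp(-k * z(x)))

def barrier(x, rho):
    """B_r(x) = rho * y(x) / z(x), defined only in the interior of the safe set."""
    zx = z(x)
    if zx <= 0.0:
        raise ValueError("state is not in the interior of the safe set")
    return rho * y(x) / zx

x_far = np.array([1.0, -1.0, 1.0])             # initial state of Example 1
x_near = CENTER + np.array([0.16, 0.0, 0.0])   # just outside the ball

b_far = barrier(x_far, rho=2.0)
b_near = barrier(x_near, rho=2.0)
b_near_large_rho = barrier(x_near, rho=4.0)
```

Under these assumptions the barrier is negligible at the initial state, grows sharply near the boundary, and scales linearly with $\rho$, so a larger $\rho$ puts more weight on the safety term in the cost.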

Safe and Optimal Control Approach
By augmenting the selected CBF $B_r(x)$ to the cost function (3), a new refined cost function (7) is obtained.
Remark 3. To ensure the safety of the system, it is assumed that the initial system state $x$ lies within the set $C_a$. If the initial state were outside $C_a$, the rapid increase in $B_r(x)$ as the state $x$ nears the boundary of $C_a$ would penalize the convergence of the state and prevent the system state from converging.
The conventional control problems can be transformed into two-player ZSG problems. The Nash equilibrium point, i.e., the saddle point (u * ,v * ) can be obtained by addressing the special HJI equation. Then, the optimal cost function is defined by The purpose of the two-player ZSG problem is to identify a saddle point so that the following inequality can hold: Therefore, for the two-player ZSG problem, u * is the optimal control input policy minimizing the cost function, and v * represents the worst disturbance input policy maximizing the cost function.

Definition 1.
An input policy $u$ is considered admissible with respect to (7) on a set $\Omega \subseteq \mathbb{R}^n$, denoted by $u \in \aleph(\Omega)$, if $u$ is continuous on $\Omega$, $u$ stabilizes (1) on $\Omega$, and (7) is finite for any $x \in \Omega$.
For an admissible input policy $u$, if Equation (7) is continuously differentiable, differentiating $V(x)$ with respect to $t$ on both sides of Equation (7) yields a nonlinear Lyapunov equation. Based on the optimal control approach, the HJI equation for the two-player ZSG problem possesses a unique solution if there exists a saddle point, that is, if the conditions in Equation (11) hold, where $H(x, u, v, \nabla V^*(x))$ refers to the Hamiltonian function of the safety-critical system (1), given in Equation (12). By using Equations (11) and (12), the saddle point can be found by solving two equations, one for the control input and one for the disturbance input. Thus, the saddle point $(u^*, v^*)$ is obtained as Equations (15) and (16), where $\psi \in \mathbb{R}^m$ is the vector whose entries all equal the midpoint of the input bounds given by Equation (5).
Substituting Equations (15) and (16) into Equation (11), the HJI equation can be redefined as Equation (17). For the optimal safe control problem of the ZSG with unmatched external disturbances and asymmetric input constraints, the optimal cost function (8) must be obtained in order to compute the optimal control input policy (15) and the worst disturbance input policy (16). Therefore, the solution of Equation (17) needs to be obtained. Nevertheless, since Equation (17) is a nonlinear partial differential equation, it is challenging to find its analytical solution using conventional mathematical approaches. Hence, the solution of this equation is estimated by using the CNN in the next section.
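The saddle-point structure of the two-player ZSG can be checked numerically on a toy problem. The sketch below deliberately swaps in a simpler Hamiltonian than the paper's: a scalar linear plant, a quadratic control penalty $r u^2$ in place of $U(u)$, and no barrier term. All coefficients and names are hypothetical. It only illustrates that setting $\partial H/\partial u = 0$ and $\partial H/\partial v = 0$ yields a pair $(u^*, v^*)$ satisfying the usual inequality $H(u^*, v) \le H(u^*, v^*) \le H(u, v^*)$.

```python
import numpy as np

def hamiltonian(u, v, x=1.0, p=0.8, a=-1.0, b=1.0, k=0.5, q=1.0, r=1.0, gamma=2.0):
    """Toy Hamiltonian: running cost plus costate p times the dynamics a*x + b*u + k*v."""
    running_cost = q * x**2 + r * u**2 - gamma**2 * v**2
    return running_cost + p * (a * x + b * u + k * v)

def saddle_point(p=0.8, b=1.0, k=0.5, r=1.0, gamma=2.0):
    """Stationarity conditions dH/du = 0 and dH/dv = 0 solved in closed form."""
    u_star = -b * p / (2.0 * r)
    v_star = k * p / (2.0 * gamma**2)
    return u_star, v_star

u_star, v_star = saddle_point()
h_star = hamiltonian(u_star, v_star)

# Scan each player's action while the other one stays at the saddle point.
grid = np.linspace(-3.0, 3.0, 601)
worst_over_v = hamiltonian(u_star, grid).max()  # disturbance cannot raise H above h_star
best_over_u = hamiltonian(grid, v_star).min()   # control cannot lower H below h_star
```

The same logic underlies the constrained case, except that the non-quadratic $U(u)$ turns the minimizing control into a saturated function of the costate instead of a linear one.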

Solving the HJI Equation via the CNN
This section designs a CNN to estimate the cost function $V^*(x)$ as $V^*(x) = W_c^T \delta(x) + \xi(x)$ (18), where $\xi(x)$ represents the estimation error of the CNN with $\xi(0) = 0$, $W_c \in \mathbb{R}^r$ represents the ideal weight vector of the CNN, $\delta(x) = [\delta_1(x); \delta_2(x); \ldots; \delta_r(x)]$ represents the activation function with $\delta_j(0) = 0$, $j = 1, 2, \ldots, r$, and $r$ is the number of neurons in the CNN. The gradient of the approximate optimal cost function is $\nabla V^*(x) = \nabla\delta(x)^T W_c + \nabla\xi(x)$ (19). Substituting Equation (19) into Equation (15), $u^*(x)$ can be represented as Equation (20), in which $\bar{A}(x)$ and $T(x)$ appear, with $\Phi(A(x)) = \mathrm{diag}\{\tanh^2(A_l(x))\}$ ($l = 1, 2, \ldots, m$) and $A(x) = [A_1(x); A_2(x); \ldots; A_m(x)] \in \mathbb{R}^m$ being selected between $\bar{A}(x)$ and $T(x)$. Then, considering Equation (19), $v^*(x)$ in Equation (16) can be redefined accordingly. Similarly, substituting Equation (19) into Equation (17), the HJI equation can be rewritten in terms of the CNN, where $K(x) = \frac{1}{2\Psi} G(x)^T \nabla\xi(x)$. However, since the ideal CNN weight $W_c$ in Equation (18) is unknown, it cannot be used in the control procedure. Hence, the CNN is used to estimate the cost function and its gradient as $\hat{V}(x) = \hat{W}_c^T \delta(x)$ and $\nabla\hat{V}(x) = \nabla\delta(x)^T \hat{W}_c$, where $\hat{W}_c$ represents the estimate of $W_c$. Therefore, the approximate optimal input and the approximate worst disturbance input become $\hat{u}(x)$ and $\hat{v}(x)$, respectively, and the approximated Hamiltonian function $\hat{H}$ is formulated from them. The CNN weight estimation error is denoted by $\tilde{W}_c$, and the approximation error of the Hamiltonian function is given in Equation (33). To achieve $\hat{W}_c \to W_c$, it is necessary to ensure that this approximation error tends to 0, and the target function is chosen accordingly. Consequently, based on a normalized gradient descent algorithm, the tuning law for the weight vector $\hat{W}_c$ is given by Equation (34), with $\alpha > 0$ being the adjustable parameter and the approximation error defined as in Equation (33). Using Equations (32) and (34), the dynamics of the weight approximation error, $\dot{\tilde{W}}_c$, can then be expressed.
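The normalized gradient descent idea can be sketched on a toy problem. The update below, $\hat{W}_c \leftarrow \hat{W}_c - \alpha\,\Delta t\,\zeta e/(1 + \zeta^T\zeta)^2$ with residual $e = \hat{W}_c^T\zeta + \text{target}$, is a commonly used form that is assumed here. The paper's exact tuning law is Equation (34), which is not reproduced, and the regressor samples, the target term, and all names are hypothetical.

```python
import numpy as np

def critic_step(w_hat, zeta, target, alpha, dt):
    """One Euler step of normalized gradient descent on E = 0.5 * e**2."""
    e = w_hat @ zeta + target          # residual, linear in the weights
    norm = (1.0 + zeta @ zeta) ** 2    # normalization keeps the step bounded
    return w_hat - alpha * dt * zeta * e / norm

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])    # hypothetical ideal weights
w_hat = np.ones(3)                     # initial weights all set to 1
initial_error = np.linalg.norm(w_hat - w_true)

for _ in range(5000):
    zeta = rng.normal(size=3)          # random regressors act as persistent excitation
    target = -(w_true @ zeta)          # chosen so that e = 0 exactly at w_hat = w_true
    w_hat = critic_step(w_hat, zeta, target, alpha=10.0, dt=0.1)

final_error = np.linalg.norm(w_hat - w_true)
```

In this toy setting the weight error shrinks only along directions that the regressor actually visits, which is why a probing noise is typically added to the control input during learning.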

Stability Analysis
The UUB of both the state $x$ and the CNN parameters in the closed-loop system is demonstrated in this subsection using Lyapunov stability analysis. First, two assumptions that were also used in [28,42] are required.
Assumption 2. The ideal optimal CNN weight vector $W_c$ is upper bounded, i.e., $\|W_c\| \le b_{W_c}$, where $b_{W_c} > 0$ is a constant. Moreover, for any $x$ in the set under consideration, this paper assumes that there are two known constants
Proof. Let the Lyapunov candidate function be given by Equation (37) (note: for convenience, $V^*(x)$ and $(1/2)\tilde{W}_c^T\tilde{W}_c$ are abbreviated as $L_1$ and $L_2$ below). Taking the derivative of $L_1$ in Equation (37) along System (1) gives the expression of $\dot{L}_1$ in Equation (38). Then, using Equations (12) and (11), Equation (39) can be derived. Similarly, taking into account Equations (27) and (28), Equations (40) and (41) are obtained. According to Equations (38)-(41), Equation (38) can be rewritten as Equation (42) (note: for convenience, $\omega - U(u^*)$ and $2\Upsilon^2 v^{*T}v^* - \Upsilon^2\|v^*\|^2 - B_r(x)$ are abbreviated as $\Lambda_1$ and $\Lambda_2$ below). We apply Young's inequality to Equation (43). Additionally, considering Equations (19), (20), (27), (40) and (41), $\omega$ can be formulated as Equation (44). Furthermore, utilizing Young's inequality, $\omega$ in Equation (44) further yields $\omega \le 2\|-\Psi\tanh(\Gamma(x)) + \Psi\tanh(\bar{A}(x))\|^2 + 2\|\xi_{u^*}(x)\|^2$. According to Equations (21) and (31), two further inequalities, (46) and (47), can be obtained. Based on Equation (46) and Assumptions 2 and 3, $\omega$ can be bounded as in Equation (48). By observing Equations (4) and (5), it can be concluded that $U(u^*) > 0$.
Using Young's inequality and Equation (48), the expression of $\Lambda_1$ in Equation (42) can be rewritten as Equation (49). Similarly, $\Lambda_2$ in Equation (42) can be rewritten as Equation (50) (note: from Assumption 1, $B_r(x) \ge 0$). Meanwhile, using Young's inequality and Assumptions 1 and 3, $\Lambda_2$ in Equation (50) further yields Equation (51). Hence, by observing Equations (49) and (51), a bound on $\dot{L}_1$ in Equation (42) can be inferred, given in Equation (52). Then, the derivative of $L_2$ in Equation (37) along the solution of Equation (34) is given in Equation (53) (note: $\alpha W_c^T(\zeta/O)\xi_c$ is abbreviated as $\Lambda_3$ below). Next, using Young's inequality, $\Lambda_3$ can be bounded. Additionally, with Assumption 3 holding, it can be deduced that $\dot{L}_2$ in Equation (53) satisfies Equation (55). Using Equations (37), (52) and (55), a bound on $\dot{L}$ is obtained. Finally, $\dot{L} < 0$ holds if $x$ lies outside its bound set or $\tilde{W}_c$ lies outside its bound set; based on Equation (36), these two sets are formulated in Equations (57) and (58), respectively. To summarize, the Lyapunov stability method has been used to demonstrate that the state $x$ of Equation (1) and $\tilde{W}_c$ are UUB, with Equations (57) and (58) representing their respective bounds. The proof is complete.

Simulation Study
Within this section, two examples are utilized to validate the efficacy of the proposed approach.

Example 1
Consider the F16 aircraft plant used in [28], where $x(t) = [x_1, x_2, x_3]^T \in \mathbb{R}^3$ with $x_0 = [1, -1, 1]^T$ represents the system state vector, and $x_1$, $x_2$, and $x_3$ represent the attack angle, the pitch rate, and the elevator deflection angle, respectively; $u$ is the control input and $v$ is the disturbance input. The control input $u$ is constrained to be greater than $-1$ and less than $2$. Hence, $\Psi = 1.5$ and the midpoint of the input bounds is $0.5$. The danger region is described as a ball with a radius of 0.15 and a center at $[0.3, 0.05, -0.05]^T$, and the functions $y(x)$ and $z(x)$ are chosen for this region. In addition, substituting $\Psi$ and the midpoint into Equation (4) gives the expression of $U(u)$. Letting $Q = I_3$ and $\Upsilon = 2$, the cost function for Equation (62) is formulated with $B_r(x) = \rho\, y(x)/z(x)$ representing the CBF and $\rho = 2$. The activation function is given as $\delta(x) = [x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2]^T$ and the CNN weight vector is $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \hat{W}_{c3}, \hat{W}_{c4}, \hat{W}_{c5}, \hat{W}_{c6}]^T$. In addition, the adjustable parameter $\alpha$ is 10, and the initial weights of the CNN are all set to 1. Finally, the probing noise $e^{-0.1t}(0.001)(\sin^2(t)\cos(t) + \sin^2(2t)\cos(0.1t))$ is added to the control input policy for the initial 30 s in order to ensure persistence of excitation.
Through simulation experiments, Figures 1-7 are obtained. Figure 1 shows that $\hat{W}_c$ converges after the first 10 s, yielding the ideal vector $W_c^* = [16.4603, -6.5022, -4.3910, 4.8851, 3.7081, 11.6158]^T$. Figure 2 displays the convergence of the states $x_1$, $x_2$, and $x_3$. Figure 3 displays the danger region, which is represented by the ball, and the original states are in the danger area. However, the system states controlled by the safe optimal controller bypass this ball, and as the damping coefficient $\rho$ increases, the distance between the system states and the dangerous region grows. Figure 3 also shows that as states $x_1$, $x_2$, and $x_3$ gradually approach the danger zone, the convergence of $x_3$ is accelerated due to the CBF and cost function. Figure 4 presents the control input $u$ with asymmetric input constraints. The plot shows that $u$ remains within the specified range, bounded by $u_{\max} = 2$ and $u_{\min} = -1$, which indicates that the asymmetric input constraints are enforced successfully. Figure 5 presents the disturbance input $v$. Figure 6 presents the cost function of the system. When the system states approach the danger area, the cost function changes significantly, and it eventually converges to zero. The cost function imposes a higher penalty on control actions that do not comply with the asymmetric input constraints and safety constraints. Therefore, according to the principle of optimal control, convergence of the cost function to zero indicates that the system has found the optimal control actions that satisfy all the constraints.
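As a small worked example, the learned critic of Example 1 can be evaluated directly from the reported activation function and converged weights. The sketch below assumes the critic has the linear-in-weights form $\hat{V}(x) = \hat{W}_c^T\delta(x)$ with gradient $\nabla\delta(x)^T\hat{W}_c$. The function names are illustrative, and the Jacobian of $\delta$ is derived here by hand, not taken from the paper.

```python
import numpy as np

# Converged weights reported for Example 1.
W_STAR = np.array([16.4603, -6.5022, -4.3910, 4.8851, 3.7081, 11.6158])

def delta(x):
    """Quadratic activation vector [x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2]."""
    x1, x2, x3 = x
    return np.array([x1**2, x1 * x2, x1 * x3, x2**2, x2 * x3, x3**2])

def grad_delta(x):
    """Jacobian of delta with respect to x, shape (6, 3)."""
    x1, x2, x3 = x
    return np.array([
        [2 * x1, 0.0, 0.0],
        [x2, x1, 0.0],
        [x3, 0.0, x1],
        [0.0, 2 * x2, 0.0],
        [0.0, x3, x2],
        [0.0, 0.0, 2 * x3],
    ])

def value(x, w=W_STAR):
    """Critic output V_hat(x) = w^T delta(x)."""
    return float(w @ delta(x))

def value_gradient(x, w=W_STAR):
    """Gradient of the critic output, grad_delta(x)^T w."""
    return grad_delta(x).T @ w

x0 = np.array([1.0, -1.0, 1.0])   # initial state of Example 1
v0 = value(x0)
g0 = value_gradient(x0)
```

The gradient `g0` is the quantity that would enter the approximate control and disturbance policies through $G(x)^T\nabla\hat{V}(x)$ and $P(x)^T\nabla\hat{V}(x)$.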
To further show the efficiency of the presented method, Equation (4) is redefined as the quadratic form $u^T R u$ (where $R = I_1$), and the corresponding simulation results are illustrated in Figure 7. The control input in Figure 4 stays within the limits of $-1$ to $2$, whereas the input in Figure 7 clearly leaves this range.

Example 2
We consider a nonlinear system where $x(t) = [x_1, x_2]^T \in \mathbb{R}^2$ with $x_0 = [1, -1]^T$ represents the system state vector. As in the F16 example, the control input $u$ is subject to asymmetric bounds, with a lower bound of $-1$ and an upper bound of $3$. Hence, $\Psi = 2$ and the midpoint of the input bounds is $1$. The danger region is described as a circle with a radius of 0.1 and a center at $[0.19, -0.12]^T$, and the functions $y(x)$ and $z(x)$ are chosen for this region.
In addition, substituting $\Psi$ and the midpoint into Equation (4) gives the expression of $U(u)$. Letting $Q = I_2$ and $\Upsilon = 1.35$, the cost function for Equation (62) is formulated with $B_r(x) = \rho\, y(x)/z(x)$ representing the CBF and $\rho = 0.3$. Then, the CNN presented as Equation (18) is applied to solve the HJI equation for Equation (62). In addition, the adjustable parameter $\alpha$ is 20, and the initial weights of the CNN are all set to 1. Finally, the probing noise $e^{-0.001t}(-0.1)(\sin^2(t)\cos(t) + \sin^5(t) + \sin^2(2t)\cos(0.1t) + \sin^2(-1.2t)\cos(0.5t))$ is added to the control input policy for the initial 30 s.
Through simulation experiments, Figures 8-14 are obtained. Figure 8 shows that $\hat{W}_c$ converges after the first 10 s, yielding the ideal vector $W_c^* = [84.6487, -12.2017, 9.5269, 11.7425, -3.0924, 3.4273, -0.5533, 2.0591]^T$. Figure 9 displays the convergence of the states $x_1$ and $x_2$. Figure 10 illustrates the relationship between the system states and the dangerous area, revealing that increasing the damping factor $\rho$ leads to a greater distance between the system states and the dangerous zone. The system states $x_1$ and $x_2$ under the safe and optimal controller take an alternate route to avoid the dangerous region, while the conventional optimal controller cannot circumvent it. As can be seen from Figure 10, when states $x_1$ and $x_2$ gradually approach the danger zone, the convergence of $x_2$ is accelerated due to the influence of the CBF and cost function, and an optimal trajectory around the danger zone is obtained. Figure 11 shows the input $u$ with asymmetric input constraints. The plot shows that $u$ remains within the specified range, bounded by $u_{\max} = 3$ and $u_{\min} = -1$, which indicates that the asymmetric input constraints are enforced successfully. Figure 12 presents the disturbance input $v$. Figure 13 presents the cost function of the system, which eventually converges to zero. Similar to the linear system, when the cost function converges to zero, it can be concluded that the system finds the optimal control action that satisfies the asymmetric input constraints and safety constraints.
In this paper, asymmetric input constraints and unmatched disturbances are applied to nonlinear safety-critical systems for the first time, and Equation (4) is used to handle the asymmetric input constraints. To further demonstrate the efficacy of the presented algorithm, Equation (4) is redefined as the quadratic form $u^T R u$ (where $R = I_1$), as in articles [14,16,28], and the simulation results are shown in Figure 14. The control input in Figure 11 stays within the limits of $-1$ to $3$, whereas the input in Figure 14 clearly leaves this range.

Conclusions
The safe and optimal control problem of nonlinear CT safety-critical systems with asymmetric input constraints and unmatched disturbances was addressed. Firstly, a new non-quadratic form function was introduced to handle the asymmetric input constraints. Then, the control design was transformed into a two-player ZSG problem to handle unmatched disturbances. To obtain a safe optimal controller, the CBF was combined directly with the cost function to penalize unsafe behavior. Moreover, a single CNN was applied to reduce the computational complexity relative to the dual actor-critic network. The effectiveness of the proposed method was validated by the simulation results.